Avoiding the Look-Ahead Pathology of Decision Tree Learning

Authors

  • Mark Last
  • Michael Roizman
Abstract

Most decision-tree induction algorithms use a local greedy strategy, in which a leaf is always split on the best attribute according to a given attribute selection criterion. A more accurate model could possibly be found by looking ahead for alternative subtrees. However, some researchers argue that look-ahead should not be used because of its negative effect (called "decision tree pathology") on decision tree accuracy. This paper presents a new look-ahead heuristic for decision-tree induction. The proposed method is called LA-J48 ("Look-Ahead J48") because it is based on J48, the Weka implementation of the popular C4.5 algorithm. At each tree node, the LA-J48 algorithm applies a look-ahead procedure of bounded depth only to attributes that are not statistically distinguishable from the best attribute chosen by the greedy approach of C4.5. A bootstrap process is used to estimate the standard deviation of splitting criteria whose probability distribution is unknown. The attribute that produces the most accurate subtree on a separate validation set is chosen for the next step of the algorithm. In experiments on 20 benchmark datasets, the proposed look-ahead method outperforms the greedy J48 algorithm with the Gain Ratio and the Gini Index splitting criteria, thus avoiding the look-ahead pathology of decision tree induction.
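
The candidate-filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the exact statistical test, so the sketch assumes a simple rule: an attribute counts as indistinguishable from the greedy choice if its gain ratio lies within `z` bootstrap standard deviations of the best score. The function names, the threshold `z`, the number of resamples `n_boot`, and the restriction to nominal attributes are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio of a nominal attribute (column index attr)."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    if split_info == 0.0:  # attribute takes a single value: no usable split
        return 0.0
    return (entropy(labels) - remainder) / split_info


def bootstrap_std(rows, labels, attr, n_boot=200, seed=0):
    """Standard deviation of the gain ratio over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(rows)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        scores.append(
            gain_ratio([rows[i] for i in idx], [labels[i] for i in idx], attr)
        )
    mean = sum(scores) / n_boot
    return math.sqrt(sum((s - mean) ** 2 for s in scores) / (n_boot - 1))


def lookahead_candidates(rows, labels, z=1.96):
    """Return the greedy best attribute and the attributes kept for look-ahead.

    Assumed rule (not taken from the paper): keep every attribute whose gain
    ratio is within z bootstrap standard deviations of the best score.
    """
    n_attrs = len(rows[0])
    scores = [gain_ratio(rows, labels, a) for a in range(n_attrs)]
    best = max(range(n_attrs), key=scores.__getitem__)
    threshold = scores[best] - z * bootstrap_std(rows, labels, best)
    return best, [a for a in range(n_attrs) if scores[a] >= threshold]
```

In the method as summarized above, a bounded-depth subtree would then be grown for each retained attribute and the one with the highest validation-set accuracy would be selected; that step is omitted from this sketch.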

Similar resources

Look-ahead based fuzzy decision tree induction

Decision tree induction is typically based on a top-down greedy algorithm that makes locally optimal decisions at each node. Due to the greedy and local nature of the decisions made at each node, there is considerable possibility of instances at the node being split along branches such that instances along some or all of the branches require a large number of additional nodes for classification...

MMDT: Multi-Objective Memetic Rule Learning from Decision Tree

In this article, a Multi-Objective Memetic Algorithm (MA) for rule learning is proposed. Prediction accuracy and interpretation are two measures that conflict with each other. In this approach, we consider both the accuracy and the interpretation of rule sets. Additionally, individual classifiers face other problems such as huge size, high dimensionality, and data sets with imbalanced class distributions. This...

A novel hybrid method for vocal fold pathology diagnosis based on Russian language

In this paper, an initial feature vector for vocal fold pathology diagnosis is first proposed. Then, a genetic algorithm is proposed for optimizing the initial feature vector. Experiments are carried out to evaluate and compare the classification accuracies obtained with different classifiers (ensemble of decision tree, discriminant analysis and K-nearest neig...

Classifying two-class data with axis-parallel hyper-rectangles

Supervised learning is one of the tasks of machine learning. In supervised learning, we infer a function from labeled training data. The goal of supervised learning algorithms is to learn a good hypothesis that minimizes the sum of the errors. A wide range of supervised algorithms is available, such as decision trees, SVM, and KNN methods. In this paper we focus on decision tree algorithms. When we ...

On Using Linear Diophantine Equations to Tune the Extent of Look Ahead while Hiding Decision Tree Rules

This paper focuses on preserving the privacy of sensitive patterns when inducing decision trees. We adopt a record augmentation approach for hiding sensitive classification rules in binary datasets. Such a hiding methodology is preferred over other heuristic solutions like output perturbation or cryptographic techniques which restrict the usability of the data since the raw data itself is readi...


Journal:
  • Int. J. Intell. Syst.

Volume 28

Publication date: 2013